Evaluation of Odor Prediction Model Performance and Variable Importance according to Various Missing Imputation Methods
Abstract
The aim of this study is to ascertain the most suitable model for predicting complex odors from odor substance data that has a small number of samples and a large amount of missing data. First, we compared removal and imputation methods for handling the missing data and found imputation to be more effective. Then, to recommend a model, we created a total of 126 models (missing-data imputation: single imputation, multiple imputation, K-nearest neighbor imputation; preprocessing: standardization, principal component analysis, partial least squares; predictive method: regression, machine learning, deep learning) and evaluated them using the coefficient of determination (R²) and mean absolute error (MAE). Finally, we investigated the variable importance of the best prediction model. The best model was identified as the combination of the multivariate Bayesian ridge imputation method, standardization preprocessing, and an extremely randomized trees method. Among the compounds, methyl mercaptan, acetic acid, and dimethyl sulfide were identified as important compounds for complex odors.
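The abstract lists K-nearest neighbor imputation as one of the three imputation strategies and compares the 126 models by R² and MAE. The following is a minimal pure-Python sketch of those two ingredients, not the authors' code. The distance rule (Euclidean distance over the compounds observed in both samples, rescaled to the full feature count), the choice k = 2, and the toy data are assumptions made for illustration. The reported best combination (multivariate Bayesian ridge imputation, standardization, extremely randomized trees) is not implemented here.

```python
import math
from typing import List, Optional, Sequence

# A row of compound measurements; None marks a missing value.
Row = List[Optional[float]]


def _distance(a: Row, b: Row) -> float:
    """Euclidean distance over the features observed in both rows, scaled up
    to the full feature count (an assumed nan-Euclidean-style scheme)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common observed feature: treat as infinitely far
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Replace each missing cell with the mean of that feature over the k
    nearest other rows in which the feature was observed."""
    filled: List[List[float]] = []
    for i, row in enumerate(rows):
        new_row: List[float] = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            # Candidate donors: other rows that observed feature j, nearest first.
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if not donors:
                raise ValueError(f"feature {j} is missing in every other row")
            new_row.append(sum(v for _, v in donors) / len(donors))
        filled.append(new_row)
    return filled


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE: average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data: 4 samples x 3 compounds, one value missing.
data: List[Row] = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [5.0, 6.0, 7.0],
    [1.2, 2.2, 2.9],
]
print(knn_impute(data, k=2)[1])  # [1.1, 2.1, 3.1]
print(r2_score([3.0, 5.0, 7.0], [2.5, 5.0, 7.5]))  # 0.9375
print(mean_absolute_error([3.0, 5.0, 7.0], [2.5, 5.0, 7.5]))  # 0.3333333333333333
```

Library implementations such as scikit-learn's `KNNImputer` use a comparable nan-Euclidean distance by default. When comparing models, `y_true` and `y_pred` would normally come from held-out samples rather than the rows used for fitting.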
Related articles
Performance evaluation of different estimation methods for missing rainfall data
There are numerous methods to estimate missing values, some of which are used depending on the data type and regional climatic characteristics. In this research, part of the monthly precipitation data at the Sarab synoptic station, East Azerbaijan Province, Iran, was randomly treated as missing. In order to study the effectiveness of various methods to estimate missing data, by seven classic s...
Variable Importance and Prediction Methods for Longitudinal Problems with Missing Variables
We present prediction and variable importance (VIM) methods for longitudinal data sets containing continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice involves rules of thumb and scoring methods that only use a few variables and ignore the dynamic and...
Performance Evaluation of L1-norm-based Microarray Missing Value Imputation
L1-norm minimization was applied to the imputation of missing microarray values, an important procedure in bioinformatics experiments. Two L1 approaches, based respectively on the framework of local least squares (LLS) and iterative bicluster-based least squares (bicluster-iLLS), were employed. Imputed datasets from the L1 approaches were compared with those from traditional L2 methods. Th...
Performance Evaluation of Missing-Value Imputation Clustering Based on a Multivariate Gaussian Mixture Model
BACKGROUND: It is challenging to deal with mixture models when missing values occur in clustering datasets. METHODS AND RESULTS: We propose a dynamic clustering algorithm based on a multivariate Gaussian mixture model that efficiently imputes missing values to generate a "pseudo-complete" dataset. Parameters from different clusters and missing values are estimated according to the maximum likel...
Analyzing Data Sets with Missing Data: An Empirical Evaluation of Imputation Methods and Likelihood-Based Methods
Missing data are often encountered in data sets used to construct effort prediction models. Thus far, the common practice has been to ignore observations with missing data. This may result in biased prediction models. In this paper, we evaluate four missing data techniques (MDTs) in the context of software cost modeling: listwise deletion (LD), mean imputation (MI), similar response pattern im...
Journal
Journal title: Applied Sciences
Year: 2022
ISSN: 2076-3417
DOI: https://doi.org/10.3390/app12062826